Grammar of (statistical) graphics : a tiny overview

Prasanna Bhogale
02-12-2016

with infinite gratitude to Hadley Wickham

Grammar

“the fundamental principles or rules of an art or science”

Grammar + Vocabulary \( \rightarrow \) Communication

Statistical graphics as a list

Types of chart in MS Excel

Clustered column chart, Stacked column chart, 100% stacked column chart, 3-D column chart, Cylinder, cone, and pyramid chart, Line chart, Stacked line chart, 100% stacked line chart, 3-D line chart, Pie chart, Pie of pie or bar of pie chart, Exploded pie chart, Clustered bar chart, Stacked bar chart, 100% stacked bar chart and 100% stacked bar chart in 3-D, Horizontal cylinder, cone, and pyramid chart, Area chart, Stacked area chart, 100% stacked area chart, Scatter chart, Scatter chart with smooth lines and scatter chart with smooth lines and markers, Scatter chart with straight lines and scatter chart with straight lines and markers, Bubble chart, Bubble chart or bubble chart with 3-D effect, High-low-close stock chart, Open-high-low-close stock chart, Volume-high-low-close stock chart, Volume-open-high-low-close stock chart, Doughnut chart, Exploded doughnut chart, Radar chart, Filled radar chart, …

Elements of the grammar

  • Data

from Wickham 2010

Elements of the grammar

  • Data
  • Mapping data to aesthetics
    • Geometry : points, lines, bars, colours, shapes…
    • Coordinate system : position on plane


from Wickham 2010

Elements of the grammar

  • Data
  • Mapping data to aesthetics
  • Mapping aesthetics to display


from Wickham 2010

Elements of the grammar

  • Data \( \rightarrow \) aesthetics \( \rightarrow \) display
  • Scales : display the relationship of position to data


from Wickham 2010
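
As a rough sketch of how the elements above line up with ggplot2 calls, the following builds one scatter plot piece by piece. The `mtcars` dataset (shipped with R) and the chosen variables are illustrative assumptions, not taken from the slides.

```r
library(ggplot2)

# Data: a plain data frame (mtcars ships with R; chosen only for illustration)
p <- ggplot(mtcars,
            # Mapping data to aesthetics: columns -> x, y, colour
            aes(x = wt, y = mpg, colour = factor(cyl))) +
  # Geometry: draw each row as a point
  geom_point() +
  # Scales: control how data values translate to positions and colours
  scale_x_continuous(name = "Weight (1000 lbs)") +
  scale_colour_discrete(name = "Cylinders") +
  # Coordinate system: Cartesian plane (the default, written out explicitly)
  coord_cartesian()

print(p)
```

Each `+` adds one element of the grammar, so swapping a single line (for example the geometry) changes the chart without touching the data or the mapping.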

A language instead of a list

bars + geom_bar()

plot of chunk unnamed-chunk-3

A language instead of a list

bars + geom_bar(width=1) + coord_polar(theta='y')

plot of chunk unnamed-chunk-4

A language instead of a list

bars + geom_bar(width=1) + coord_polar(theta='x')

plot of chunk unnamed-chunk-5
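
The slides do not show how `bars` is defined. A minimal sketch, assuming it is a ggplot object with a single dummy x position and a fill colour mapped to a categorical variable (here `cyl` from `mtcars`, a hypothetical choice), shows how the same bar geometry can become a stacked bar, a pie chart or a bullseye by changing only the coordinate system.

```r
library(ggplot2)

# Hypothetical stand-in for the `bars` object used on the slides:
# one x position, with fill colour mapped to a categorical variable.
bars <- ggplot(mtcars, aes(x = factor(1), fill = factor(cyl)))

# Same data, same geometry; only the coordinate system differs.
stacked  <- bars + geom_bar(width = 1)                             # stacked bar
pie      <- bars + geom_bar(width = 1) + coord_polar(theta = "y")  # bar height -> angle
bullseye <- bars + geom_bar(width = 1) + coord_polar(theta = "x")  # bar height -> radius

print(stacked)
print(pie)
print(bullseye)
```

Under this assumption, "pie chart" is not a separate chart type: it is a stacked bar drawn in polar coordinates.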

Building a complex graphic

A retreat from Russia

library(ggplot2)   # ggplot(), geom_path()
library(ggthemes)  # theme_fivethirtyeight()
troops <- read.csv("minard-troops.csv")
head(troops)
  long  lat survivors direction group
1 24.0 54.9    340000         A     1
2 24.5 55.0    340000         A     1
3 25.5 54.5    340000         A     1
4 26.0 54.7    320000         A     1
5 27.0 54.8    300000         A     1
6 28.0 54.9    280000         A     1

Napoleon in Russia : path

ggplot(troops) + geom_path(aes(x=long,y=lat), size=2) + theme_fivethirtyeight()

plot of chunk unnamed-chunk-9

Napoleon in Russia : attack and retreat

ggplot(troops) + geom_path(aes(x=long,y=lat, colour=direction, group=group), size=2) + theme_fivethirtyeight()

plot of chunk unnamed-chunk-10

Napoleon in Russia : a defeated army

ggplot(troops) + geom_path(aes(x=long, y=lat, colour=direction, group=group, size=survivors)) + theme_fivethirtyeight()

plot of chunk unnamed-chunk-11
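
To try the layering without the CSV file or the ggthemes package, the sketch below rebuilds only the six rows printed by `head(troops)` and adds one aesthetic at a time. The name `troops_head` is introduced here for illustration. Because these six rows all belong to direction `A` and group `1`, the retreat does not appear; the full file is needed for that. The sketch uses `linewidth`, the aesthetic ggplot2 3.4.0 and later use for line thickness; on older versions substitute `size`, as on the slides.

```r
library(ggplot2)

# The six rows printed by head(troops), typed in by hand so the example
# runs without minard-troops.csv. The full file has more rows.
troops_head <- data.frame(
  long      = c(24.0, 24.5, 25.5, 26.0, 27.0, 28.0),
  lat       = c(54.9, 55.0, 54.5, 54.7, 54.8, 54.9),
  survivors = c(340000, 340000, 340000, 320000, 300000, 280000),
  direction = "A",
  group     = 1
)

# Start from the bare path, then add one aesthetic mapping at a time.
base   <- ggplot(troops_head, aes(x = long, y = lat))
path   <- base + geom_path()
by_dir <- base + geom_path(aes(colour = direction, group = group))
army   <- base + geom_path(aes(colour = direction, group = group,
                               linewidth = survivors),
                           lineend = "round")

print(army)
```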

Grammar + Vocabulary = Power

Many modern data visualization systems, ggplot2 and Vega-Lite among them, are built on a grammar of graphics.

Converting legacy statistical visualizations to modern, JavaScript- and grammar-based graphics is a lucrative industry.

ToDo

  • Think in the grammar
  • Learn the vocabulary (JavaScript, d3.js)
  • Write visual poetry !